Comparing Entrustable Professional Activity Scores Given by Faculty Physicians and Senior Trainees to First-Year Residents

Introduction: Competence by Design (CBD) began on July 1, 2019, for postgraduate year 1 (PGY1) Canadian Core Internal Medicine (CIM) residents. Many entrustable professional activity (EPA) observations allow for assessment by a faculty physician, a senior medicine resident (SMR), or a subspecialty resident (SSR). However, few studies compare EPA scores and comments given by faculty vs senior trainees (SMRs and SSRs). This study aimed to identify differences in EPA scores and comments given to PGY1 residents by faculty physicians vs senior trainees.
Methods: Scores and comments of EPAs completed between July 1, 2019, and June 30, 2020, for 35 CIM PGY1 residents were extracted anonymously from the University of Alberta CBD platform. Scores from faculty vs senior trainees were compared with the Mann-Whitney U test and the Kruskal-Wallis test. Word counts for positive and constructive comments written by faculty vs senior trainees were compared with the independent t-test and one-way ANOVA. The most common two-word phrases in comments were identified with QI Macros software (Denver, CO: KnowWare International, Inc.).
Results: A total of 2226 EPAs were observed. Faculty physicians gave significantly lower EPA scores overall compared to senior trainees (U = 501706, P < 0.001). Constructive comments written by faculty (M = 14.06, SD = 16.84) had lower word counts compared to senior trainees (M = 15.85, SD = 16.43) for overall EPAs (t(2224) = -2.528, P = 0.012).
Conclusion: Faculty physicians gave lower EPA scores and had lower word counts on constructive comments, compared to senior trainees. These results may help the ongoing implementation of Competence by Design.


Introduction
With the increased attention to competency-based medical education (CBME) over the past few years, the Royal College of Physicians and Surgeons of Canada (RCPSC) implemented its version of CBME, called Competence by Design (CBD). CBD formally launched in July 2017 with Anesthesiology and Otolaryngology-Head and Neck Surgery, and Core Internal Medicine (CIM) formally adopted it in July 2019 [1][2][3]. Faculty physicians, senior medicine residents (SMRs), and subspecialty residents (SSRs) assess junior residents performing entrustable professional activities (EPAs), which are the essential tasks of the specialty in which the resident is training [4][5][6]. SSRs are residents who have completed their Core Internal Medicine training (postgraduate years 1-3, PGY1-3) and are now completing subspecialty training (PGY4-6). With SMRs and SSRs becoming more involved in assessing their junior colleagues with EPAs, assessments from senior trainees (SMRs and SSRs) may differ from those of faculty physicians, and any such differences could have consequences for junior resident assessment.
Individual EPA observations are scored on a 1-5 entrustment scale adapted from the Ottawa Clinic Assessment Tool; an EPA score of 1 equates to "I had to do", meaning the supervisor had to completely take over the task, and a score of 5 equates to "I didn't need to be there" [7], meaning the learner was able to perform the task competently and safely without the theoretical presence of a supervisor. The assessor can also write positive and constructive comments. These EPA assessments form the basis of resident progression through the four stages of CBD: transition to discipline (TD), foundations of discipline (FD), core of discipline, and transition to practice [6,8].
As outlined by conceptual models from Kogan et al. and Berendonk et al., the cognitive process of assessing trainees is complex and prone to influence by multiple factors, including the assessor's characteristics, their frame of reference for standard performance, and the context of the clinical encounter [9,10]. These factors likely influence senior trainees. Additionally, previous studies have looked at differences between faculty and trainee assessors for other modes of assessment, such as objective structured clinical examinations (OSCEs) and workplace-based assessments. There is conflicting evidence about which group is more lenient; some studies show that trainees give higher scores on such assessments than faculty [11][12][13], and other studies show that trainees score lower than faculty [14,15]. However, at this time, few studies compare differences in EPA scores or comments given by faculty physicians, SMRs, and SSRs. If one group is more lenient than another, this could have unintended consequences for learners and programs. For example, is it possible that learners will seek out assessment from a more lenient group? Would this information influence which supervisors are allowed to provide assessment? Does this information influence how programs interpret assessment data?
This study compares the scores and comments for TD and FD EPAs given by faculty physicians vs senior trainees to PGY1 residents in the CIM residency program at the University of Alberta.

EPA scores
Transition to discipline and foundations of discipline EPA scores and comments for 35 University of Alberta CIM PGY1s from July 2019 to June 2020 were extracted from CBME.med, the local CBD electronic platform.
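As stated in the abstract, scores from faculty and senior trainees were compared with the Mann-Whitney U test. The following is a minimal sketch of how the U statistic can be computed from two groups of ordinal entrustment scores, using average ranks for ties. The function names and the score lists are hypothetical illustrations, not study data, and the sketch omits the P value, which statistical packages usually derive from a tie-corrected normal approximation.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: its rank sum minus n_a * (n_a + 1) / 2."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2


# Hypothetical entrustment scores on the 1-5 scale (not study data).
faculty_scores = [3, 4, 3, 2, 4]
senior_scores = [4, 5, 4, 4, 5]

u_faculty = mann_whitney_u(faculty_scores, senior_scores)
u_senior = mann_whitney_u(senior_scores, faculty_scores)
print(u_faculty, u_senior)  # the two U values always sum to n_a * n_b
```

A small U for one group means its scores tend to rank below the other group's. Software packages differ in which of the two U values they report, so a reported U should be read alongside the group sizes.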

EPA comments: word counts and most common phrases
Word counts for positive and constructive comments written by faculty physicians vs senior trainees were compared with the independent t-test. Word counts for both positive and constructive EPA comments written by faculty physicians vs SMRs vs SSRs were compared with one-way ANOVA, with post-hoc testing done with Tukey's honestly significant difference test (Tukey's HSD). For EPAs FD5 and FD6, SMRs and SSRs were grouped as one choice when selecting the type of observer; thus, EPA scores between faculty physicians vs SMRs vs SSRs could only be compared for EPAs TD1 to FD4b.
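The two-group word-count comparison above can be illustrated with a short sketch of the independent (pooled-variance) t-test. It assumes the equal-variance form of the test, which is consistent with the degrees of freedom reported in the abstract (2224 = 2226 - 2). The function name and the word counts below are hypothetical, and the P value lookup is left out.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, df) for an equal-variance independent two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical word counts per constructive comment (not study data).
faculty_words = [10, 12, 14, 16]
senior_words = [14, 16, 18, 20]

t_stat, df = pooled_t(faculty_words, senior_words)
print(round(t_stat, 3), df)
```

A negative t here indicates that the first group (faculty in this sketch) has the lower mean word count, matching the sign convention of the result reported in the abstract.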
QI Macros software (Denver, CO: KnowWare International, Inc.) was used to find the top ten most common two-word phrases for both positive and constructive EPA comments provided by faculty physicians and senior trainees. The University of Alberta Medical Ethics Board approved this project (#Pro00097054). The research was conducted in accordance with the Declaration of Helsinki.
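The study used QI Macros for the two-word phrase counts. As a rough illustration of the same idea, the sketch below counts adjacent word pairs within each comment using only the Python standard library. It is not the QI Macros procedure; the tokenization rule, the function name, and the sample comments are assumptions made for this example.

```python
import re
from collections import Counter


def top_bigrams(comments, k=10):
    """Return the k most common two-word phrases as (phrase, count) pairs."""
    counts = Counter()
    for comment in comments:
        # Assumed tokenization: lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", comment.lower())
        # Pair each word with its successor; pairs never span two comments.
        counts.update(zip(words, words[1:]))
    return [(" ".join(pair), n) for pair, n in counts.most_common(k)]


# Hypothetical comments (not study data).
sample_comments = [
    "Good job today.",
    "Good job with the history.",
    "Thorough assessment, good job.",
]

top = top_bigrams(sample_comments, k=1)
print(top)
```

Grouping "variations" of a phrase, as the results describe, would need an additional normalization step (for example, merging "great job" with "good job") that this sketch does not attempt.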

EPA comments: comparing most common phrases
For positive EPA comments, the most common two-word phrases written by faculty physicians were variations of "good job," and the most common two-word phrases written by senior trainees were variations of "good job" and "thorough assessment." For constructive EPA comments, the most common two-word phrases written by faculty were variations of "no concerns" and "read around," and the most common two-word phrases written by senior trainees were variations of "no concerns" and "read around."

Discussion
Overall, faculty physicians gave significantly lower EPA scores compared to senior trainees, and among senior trainees, SSRs gave significantly lower EPA scores than SMRs. This relationship between faculty physicians and senior trainees was present for overall EPA scores, and remained when the TD and FD stages were considered separately. Faculty physicians gave lower scores compared to senior trainees for most individual EPAs as well.
These results support other studies in which medical students and residents gave higher scores than faculty on assessments such as OSCEs and workplace-based assessments [11][12][13]. For example, Hill et al. showed that faculty consultants rated medical students more strictly than specialist registrars [16]. Other studies have shown that assessors with greater seniority and rater experience have stricter scoring tendencies [17,18]. The greater seniority and rater experience of faculty physicians relative to senior trainees may explain why faculty physicians gave lower EPA scores in our study. However, conflicting literature shows that an assessor's rater experience and trainee status do not influence such scores [19,20]. Some studies show that medical students or residents gave lower ratings than faculty physicians when evaluating their peers [14,15]. Despite these discordant studies, our study supports the idea that faculty physicians give stricter ratings than senior trainees and that this effect persists in CBME curricula and EPA scores.
The conceptual frameworks from Kogan et al. and Berendonk et al. describe multiple factors that influence how assessors make judgments and may help explain the differences in EPA scores given by faculty physicians vs senior trainees [9,10]. One major factor is the assessor's frame of reference, which serves as the standard against which junior residents are graded. Faculty may use their many years of clinical experience as a frame of reference when grading junior residents, with more senior faculty giving harsher ratings [8,16]. Senior trainees are still cultivating their clinical expertise as they progress through their training, and thus may grade junior residents more leniently. Another factor influencing assessors is their individual characteristics, which include academic rank and prior participation in medical education workshops. At the time of our study, the CIM program at our institution had already piloted CBD for two years, and many SMRs had themselves previously participated in CBD as junior residents. This prior experience with CBD serves as an assessor characteristic for these SMRs: they may better understand the practical challenges junior residents face when obtaining EPAs, and may be more sympathetic and lenient with assessments compared to faculty physicians.
The conceptual frameworks also describe the impact of prior relationships between assessor and learner, which alters the social context in which feedback is given and influences rating tendencies [8,9]. For example, a prior positive relationship between an assessor and learner may cause the assessor to fall victim to the "halo effect" and award higher grades. Senior trainees are in an ideal position to develop this kind of positive relationship with junior residents, as they are more accessible to junior residents compared to faculty physicians and are closer to junior residents in training [21][22][23][24][25]. Additionally, senior trainees may be reluctant to provide negative feedback for fear of impairing social relationships with their junior residents [9,26]. The development of such close working relationships can influence senior trainees to give more lenient assessments compared to faculty.
Overall, senior trainees had higher word counts for constructive comments for EPAs compared to faculty physicians. These results are similar to those found by Ringdahl et al., where senior residents were more likely than senior faculty members to write negative comments when evaluating PGY1 residents [27]. Even though this difference between senior trainees and faculty physicians in the word counts of constructive comments is statistically significant, a difference of about two words per comment is unlikely to improve the quality of feedback. This is supported by the fact that the most common phrases for both positive and constructive comments were similar between faculty and senior trainees, suggesting little difference in feedback content.
This study has limitations. We only reviewed EPAs observed for CIM PGY1 residents, and the difference between faculty physicians and senior trainees may not be as prominent in other disciplines. EPA data were collected from a single institution, and only over one year. We also gathered data from the first year of CBD implementation in CIM programs in Canada. As faculty physicians and senior trainees become more accustomed to the tasks involved with CBD, the differences in EPA scores and comments may change. This study also does not address how important this difference is. Although it may be implied that staff feedback is more accurate, this study cannot establish that. When senior residents assess junior residents, do the higher scores they assign reflect performance better or worse than scores from staff assessors? If the scores are more lenient, further study may be critical to determine the impact of these results. Does this mean more junior residents are promoted in the CBD framework without meeting competency? Does this mean junior residents may seek assessment from their senior residents more often than from staff assessors? Is further assessor education needed to normalize assessment? While this study does not answer these questions, the results open the opportunity for further review.

Conclusions
Compared to senior trainees, faculty physicians gave significantly lower EPA scores and wrote significantly shorter constructive comments with their EPAs. The next steps for future research include expanding the number of residents involved to include multiple programs and disciplines. Residents from multiple sites should also be studied to obtain more generalizable results. Deeper analysis to determine whether other factors play a role in these results is also important, including the potential role of assessor and trainee gender, age, teaching, and clinical experience, as well as the role of EPA burden. If similar results are identified, educational leaders will need to consider the impact they have on the ongoing rollout of competency-based medical education to ensure residents are being assessed and provided feedback as intended.

Additional Information
Disclosures